使用 Monit 搭建 Linux 监控系统

前言

近期服务器不太稳定,于是需要找一些工具检测服务器状态,避免服务挂掉。这里介绍一些服务器检测工具 Monit 的使用。

安装

Monit官网 下载安装包,我这里推荐服务器是 CentOS 7,选择 linux-x64 版本。可以在服务器直接下载,我这里本地电脑下载好以后使用 Transmit 传到服务器。

在服务器中我放在了 /data 路径下,因为关于推荐的文件都放在这里,把监控服务放这里比较好管理。解压文件:

1
2
tar -zvxf monit-5.25.2-linux-x64.tar.gz
mv monit-5.25.2 monit

这时候就可以使用了。

配置

配置 monitrc

首先要配置 monit/conf/monitrc 文件,修改成以下内容,注意这里没有配置邮件发送服务,因为邮件发送需要配置 SMTP Server,暂时不需要就没配置了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below you will find examples of some frequently used statements. For
## information about the control file and a complete list of statements and
## options, please have a look in the Monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
## 间隔60秒检查服务
set daemon 60 # check services at 30 seconds intervals
# with start delay 240 # optional: delay the first check by 4-minutes (by
# # default Monit check immediately after Monit start)
#
#
## Set syslog logging. If you want to log to a standalone log file instead,
## specify the full path to the log file
## 设置日志地址
set log /data/monit/logs

#
#
## Set the location of the Monit lock file which stores the process id of the
## running Monit instance. By default this file is stored in $HOME/.monit.pid
## 设置monit的启动端口
set pidfile /data/monit/var/monit.pid
#
## Set the location of the Monit id file which stores the unique id for the
## Monit instance. The id is generated and stored on first Monit start. By
## default the file is placed in $HOME/.monit.id.
#
# set idfile /data/monit/var/.monit.id
#
## Set the location of the Monit state file which saves monitoring states
## on each cycle. By default the file is placed in $HOME/.monit.state. If
## the state file is stored on a persistent filesystem, Monit will recover
## the monitoring state across reboots. If it is on temporary filesystem, the
## state will be lost on reboot which may be convenient in some situations.
## 设置monit状态地址
set statefile /data/monit/var/.monit.state
#
#

## Set limits for various tests. The following example shows the default values:
##
# set limits {
# programOutput: 512 B, # check program's output truncate limit
# sendExpectBuffer: 256 B, # limit for send/expect protocol test
# fileContentBuffer: 512 B, # limit for file content test
# httpContentBuffer: 1 MB, # limit for HTTP content test
# networkTimeout: 5 seconds # timeout for network I/O
# programTimeout: 300 seconds # timeout for check program
# stopTimeout: 30 seconds # timeout for service stop
# startTimeout: 30 seconds # timeout for service start
# restartTimeout: 30 seconds # timeout for service restart
# }

## Set global SSL options (just most common options showed, see manual for
## full list).
#
# set ssl {
# verify : enable, # verify SSL certificates (disabled by default but STRONGLY RECOMMENDED)
# selfsigned : allow # allow self signed SSL certificates (reject by default)
# }
#
#
## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using a comma separator. If the first mail server fails, Monit
# will use the second mail server in the list and so on. By default Monit uses
# port 25 - it is possible to override this with the PORT option.
#
# set mailserver mail.bar.baz, # primary mailserver
# backup.bar.baz port 10025, # backup mailserver on port 10025
# localhost # fallback relay
#
#
## By default Monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the queue size
## by using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
# set eventqueue
# basedir /var/monit # set the base directory where events will be stored
# slots 100 # optionally limit the queue size
#
#
## Send status and events to M/Monit (for more informations about M/Monit
## see https://mmonit.com/). By default Monit registers credentials with
## M/Monit so M/Monit can smoothly communicate back to Monit and you don't
## have to register Monit credentials manually in M/Monit. It is possible to
## disable credential registration using the commented out option below.
## Though, if safety is a concern we recommend instead using https when
## communicating with M/Monit and send credentials encrypted. The password
## should be URL encoded if it contains URL-significant characters like
## ":", "?", "@". Default timeout is 5 seconds, you can customize it by
## adding the timeout option.
#
# set mmonit http://monit:monit@192.168.1.10:8080/collector
# # with timeout 30 seconds # Default timeout is 5 seconds
# # and register without credentials # Don't register credentials
#
#
## Monit by default uses the following format for alerts if the mail-format
## statement is missing::
## --8<--
## set mail-format {
## from: Monit <monit@$HOST>
## subject: monit alert -- $EVENT $SERVICE
## message: $EVENT Service $SERVICE
## Date: $DATE
## Action: $ACTION
## Host: $HOST
## Description: $DESCRIPTION
##
## Your faithful employee,
## Monit
## }
## --8<--
##
## You can override this message format or parts of it, such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender, use:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set alert recipients whom will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter as in the second example below.
#
# set alert sysadm@foo.bar # receive all alerts
#
## Do not alert when Monit starts, stops or performs a user initiated action.
## This filter is recommended to avoid getting alerts for trivial cases.
#
# set alert your-name@your.domain not on { instance, action }
#
#
## Monit has an embedded HTTP interface which can be used to view status of
## services monitored and manage services from a web interface. The HTTP
## interface is also required if you want to issue Monit commands from the
## command line, such as 'monit status' or 'monit restart service' The reason
## for this is that the Monit client uses the HTTP interface to send these
## commands to a running Monit daemon. See the Monit Wiki if you want to
## enable SSL for the HTTP interface.
## 设置web监控,使用localhost的话我们的监控就没有意义了
set httpd port 2812 and
use address 202.107.204.55 # only accept connection from localhost (drop if you use M/Monit)
allow 0.0.0.0/0.0.0.0 # allow localhost to connect to the server and
allow admin:monit # require user 'admin' with password 'monit'
#with ssl { # enable SSL/TLS and set path to server certificate
# pemfile: /etc/ssl/certs/monit.pem
#}

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
# check system $HOST
# if loadavg (1min) > 4 then alert
# if loadavg (5min) > 2 then alert
# if cpu usage > 95% for 10 cycles then alert
# if memory usage > 75% then alert
# if swap usage > 25% then alert
#
#
## Check if a file exists, checksum, permissions, uid and gid. In addition
## to alert recipients in the global section, customized alert can be sent to
## additional recipients by specifying a local alert handler. The service may
## be grouped using the GROUP option. More than one group can be specified by
## repeating the 'group name' statement.
#
# check file apache_bin with path /usr/local/apache/bin/httpd
# if failed checksum and
# expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
# if failed permission 755 then unmonitor
# if failed uid "root" then unmonitor
# if failed gid "root" then unmonitor
# alert security@foo.bar on {
# checksum, permission, uid, gid, unmonitor
# } with the mail-format { subject: Alarm! }
# group server
#
#
## Check that a process is running, in this case Apache, and that it respond
## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,
## and number of children. If the process is not running, Monit will restart
## it by default. In case the service is restarted very often and the
## problem remains, it is possible to disable monitoring using the TIMEOUT
## statement. This service depends on another service (apache_bin) which
## is defined above.
#
# check process apache with pidfile /usr/local/apache/logs/httpd.pid
# start program = "/etc/init.d/httpd start" with timeout 60 seconds
# stop program = "/etc/init.d/httpd stop"
# if cpu > 60% for 2 cycles then alert
# if cpu > 80% for 5 cycles then restart
# if totalmem > 200.0 MB for 5 cycles then restart
# if children > 250 then restart
# if disk read > 500 kb/s for 10 cycles then alert
# if disk write > 500 kb/s for 10 cycles then alert
# if failed host www.tildeslash.com port 80 protocol http and request "/somefile.html" then restart
# if failed port 443 protocol https with timeout 15 seconds then restart
# if 3 restarts within 5 cycles then unmonitor
# depends on apache_bin
# group server
#
#
## Check filesystem permissions, uid, gid, space usage, inode usage and disk I/O.
## Other services, such as databases, may depend on this resource and an automatically
## graceful stop may be cascaded to them before the filesystem will become full and data
## lost.
#
# check filesystem datafs with path /dev/sdb1
# start program = "/bin/mount /data"
# stop program = "/bin/umount /data"
# if failed permission 660 then unmonitor
# if failed uid "root" then unmonitor
# if failed gid "disk" then unmonitor
# if space usage > 80% for 5 times within 15 cycles then alert
# if space usage > 99% then stop
# if inode usage > 30000 then alert
# if inode usage > 99% then stop
# if read rate > 1 MB/s for 5 cycles then alert
# if read rate > 500 operations/s for 5 cycles then alert
# if write rate > 1 MB/s for 5 cycles then alert
# if write rate > 500 operations/s for 5 cycles then alert
# if service time > 10 milliseconds for 3 times within 5 cycles then alert
# group server
#
#
## Check a file's timestamp. In this example, we test if a file is older
## than 15 minutes and assume something is wrong if its not updated. Also,
## if the file size exceed a given limit, execute a script
#
# check file database with path /data/mydatabase.db
# if failed permission 700 then alert
# if failed uid "data" then alert
# if failed gid "data" then alert
# if timestamp > 15 minutes then alert
# if size > 100 MB then exec "/my/cleanup/script" as uid dba and gid dba
#
#
## Check directory permission, uid and gid. An event is triggered if the
## directory does not belong to the user with uid 0 and gid 0. In addition,
## the permissions have to match the octal description of 755 (see chmod(1)).
#
# check directory bin with path /bin
# if failed permission 755 then unmonitor
# if failed uid 0 then unmonitor
# if failed gid 0 then unmonitor
#
#
## Check a remote host availability by issuing a ping test and check the
## content of a response from a web server. Up to three pings are sent and
## connection to a port and an application level network check is performed.
#
# check host myserver with address 192.168.1.1
# if failed ping then alert
# if failed port 3306 protocol mysql with timeout 15 seconds then alert
# if failed port 80 protocol http
# and request /some/path with content = "a string"
# then alert
#
#
## Check a network link status (up/down), link capacity changes, saturation
## and bandwidth usage.
#
# check network public with interface eth0
# if failed link then alert
# if changed link then alert
# if saturation > 90% then alert
# if download > 10 MB/s then alert
# if total uploaded > 1 GB in last hour then alert
#
#
## Check custom program status output.
#
# check program myscript with path /usr/local/bin/myscript.sh
# if status != 0 then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
# include /etc/monit.d/*
# include的文件就是我们添加的监控项目,新建monit/conf/sys目录,在下面添加conf文件
include /data/monit/conf/sys/*.conf

配置监控进程

详细的配置方法在 Monit 的手册 配置方法 里面可以找到,这里有两个例子:

  1. monit/conf/sys/mount.conf

    如果系统中没有挂载 /dev/sdb ,那就执行挂载命令。

    1
    2
    3
    # mount.conf
    check filesystem data with path /dev/sdb
    if does not exist then exec "/bin/mount /dev/sdb /data"
  2. monit/conf/sys/recommender.conf

    如果没有名为 python server.py 的进程,那就启动。

    1
    2
    3
    4
    # recommender.conf
    check process recommender with MATCHING "python server.py"
    if does not exist then exec "/usr/bin/nohup python /data/Recommender/src_tornado/server/server.py &"
    if changed pid then alert

启动和关闭

这里有几个常用的脚本,放在 Monit 根目录下,即和 bin 同级的目录,注意这里的文件需要使用 chmod 赋予运行权限。

start.sh(启动)

1
2
pwd=$(cd `dirname $0`; pwd)
$pwd/bin/monit -c $pwd/conf/monitrc

stop.sh(关闭)

1
2
#bin/monit -c /data/server/monit/conf/monitrc quit
kill -9 `ps -ef | grep monit/bin/monit | grep -v "grep" | awk '{print $2}' `

restart.sh(重启)

1
2
3
4
kill -9 `ps -ef | grep monit/bin/monit | grep -v "grep" | awk '{print $2}' `
sleep 3
pwd=$(cd `dirname $0`; pwd)
$pwd/bin/monit -c $pwd/conf/monitrc

监控界面

启动以后就可以通过浏览器访问了,我这里配置了 http://localhost:1235/ 就是目标服务器的 2812 端口。通过浏览器访问:

image-20181217172834307

总结

这里仅提到了 Monit 的基础使用,进阶使用还需研究官方文档。

0%